Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 10692 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 358 |
| Duplicate rows (%) | 3.3% |
| Total size in memory | 1.1 MiB |
| Average record size in memory | 104.0 B |
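The figures in this table are mutually consistent, which a few lines of arithmetic can confirm. The sketch below copies the counts from the table and assumes 8 bytes per stored value; that width is an assumption about the column dtypes, not something the report states.

```python
# Counts copied from the dataset statistics table.
n_rows = 10692
n_vars = 13
duplicate_rows = 358

# ASSUMPTION: every value occupies 8 bytes (e.g. int64 or an object pointer).
bytes_per_value = 8

duplicate_pct = 100 * duplicate_rows / n_rows
record_bytes = n_vars * bytes_per_value
total_mib = n_rows * record_bytes / 1024**2
column_kib = n_rows * bytes_per_value / 1024

print(f"duplicate rows: {duplicate_pct:.1f}%")
print(f"record size:    {record_bytes} B")
print(f"total size:     {total_mib:.1f} MiB")
print(f"per column:     {column_kib:.1f} KiB")
```

Under that assumption the per-column figure also matches the 83.5 KiB memory size reported for each variable.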
Variable types
| NUM | 9 |
|---|---|
| CAT | 4 |
Reproduction
| Analysis started | 2020-05-14 00:19:13.108045 |
|---|---|
| Analysis finished | 2020-05-14 00:19:29.505963 |
| Duration | 16.4 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| Dataset has 358 (3.3%) duplicate rows | Duplicates |
|---|---|
| fire insurance (R$) is highly correlated with rent amount (R$) | High correlation |
| rent amount (R$) is highly correlated with fire insurance (R$) | High correlation |
| total (R$) is highly correlated with hoa (R$) | High correlation |
| hoa (R$) is highly correlated with total (R$) | High correlation |
| area is highly skewed (γ1 = 69.59680369) | Skewed |
| hoa (R$) is highly skewed (γ1 = 69.03938119) | Skewed |
| property tax (R$) is highly skewed (γ1 = 96.01359411) | Skewed |
| total (R$) is highly skewed (γ1 = 58.96080292) | Skewed |
| parking spaces has 2683 (25.1%) zeros | Zeros |
| hoa (R$) has 2373 (22.2%) zeros | Zeros |
| property tax (R$) has 1596 (14.9%) zeros | Zeros |
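The very large γ1 values are typical of columns in which a handful of extreme records dominate. As a rough illustration, the sketch below computes the plain population skewness (the report's exact estimator may apply a small-sample correction, so this is an assumption) for the `area` values of the first ten sample rows, with and without the largest `area` value listed in the report.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# `area` values from the first ten rows of the sample table.
area_sample = [70, 320, 80, 51, 25, 376, 72, 213, 152, 35]
# Same sample plus the largest `area` value listed in the report.
with_outlier = area_sample + [46335]

print(round(skewness(area_sample), 2))
print(round(skewness(with_outlier), 2))
```

A single extreme value is enough to raise the skewness of this small sample substantially, which suggests inspecting the maximum values of each flagged column before modelling.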
city
Categorical
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 83.5 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| São Paulo | 5887 | 55.1% | |
| Rio de Janeiro | 1501 | 14.0% | |
| Belo Horizonte | 1258 | 11.8% | |
| Porto Alegre | 1193 | 11.2% | |
| Campinas | 853 | 8.0% |
Length
| Max length | 14 |
|---|---|
| Median length | 9 |
| Mean length | 10.54517396 |
| Min length | 8 |
area
Real number (ℝ≥0)
| Distinct count | 517 |
|---|---|
| Unique (%) | 4.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 149.21791994014217 |
|---|---|
| Minimum | 11 |
| Maximum | 46335 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.5 KiB |
Quantile statistics
| Minimum | 11 |
|---|---|
| 5-th percentile | 30 |
| Q1 | 56 |
| median | 90 |
| Q3 | 182 |
| 95-th percentile | 400 |
| Maximum | 46335 |
| Range | 46324 |
| Interquartile range (IQR) | 126 |
Descriptive statistics
| Standard deviation | 537.0169423 |
|---|---|
| Coefficient of variation (CV) | 3.598877015 |
| Kurtosis | 5548.308334 |
| Mean | 149.2179199 |
| Median Absolute Deviation (MAD) | 45 |
| Skewness | 69.59680369 |
| Sum | 1595438 |
| Variance | 288387.1964 |
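Several of the statistics above are simple functions of one another. The following sketch recomputes the derived quantities from the reported mean, standard deviation, quantiles and observation count, as a sanity check rather than a description of how the report itself calculates them.

```python
# Values copied from the statistics tables above.
n = 10692
mean = 149.21791994014217
std = 537.0169423
minimum, q1, q3, maximum = 11, 56, 182, 46335

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range
total = mean * n                 # sum of all observations

print(round(cv, 6), round(variance, 2), value_range, iqr, round(total))
```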
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 50 | 334 | 3.1% | |
| 70 | 329 | 3.1% | |
| 60 | 297 | 2.8% | |
| 100 | 253 | 2.4% | |
| 80 | 253 | 2.4% | |
| 40 | 221 | 2.1% | |
| 90 | 209 | 2.0% | |
| 200 | 193 | 1.8% | |
| 45 | 189 | 1.8% | |
| 120 | 183 | 1.7% | |
| Other values (507) | 8231 | 77.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 11 | 1 | < 0.1% | |
| 12 | 1 | < 0.1% | |
| 13 | 2 | < 0.1% | |
| 15 | 19 | 0.2% | |
| 16 | 16 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 46335 | 1 | < 0.1% | |
| 24606 | 1 | < 0.1% | |
| 12732 | 1 | < 0.1% | |
| 2000 | 2 | < 0.1% | |
| 1600 | 2 | < 0.1% |
rooms
Real number (ℝ≥0)
| Distinct count | 11 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.506079311634867 |
|---|---|
| Minimum | 1 |
| Maximum | 13 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 4 |
| Maximum | 13 |
| Range | 12 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.171266254 |
|---|---|
| Coefficient of variation (CV) | 0.4673699865 |
| Kurtosis | 1.487658631 |
| Mean | 2.506079312 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.7023905761 |
| Sum | 26795 |
| Variance | 1.371864638 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 3 | 3269 | 30.6% | |
| 2 | 2975 | 27.8% | |
| 1 | 2454 | 23.0% | |
| 4 | 1586 | 14.8% | |
| 5 | 288 | 2.7% | |
| 6 | 68 | 0.6% | |
| 7 | 36 | 0.3% | |
| 8 | 11 | 0.1% | |
| 10 | 3 | < 0.1% | |
| 13 | 1 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 2454 | 23.0% | |
| 2 | 2975 | 27.8% | |
| 3 | 3269 | 30.6% | |
| 4 | 1586 | 14.8% | |
| 5 | 288 | 2.7% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 13 | 1 | < 0.1% | |
| 10 | 3 | < 0.1% | |
| 9 | 1 | < 0.1% | |
| 8 | 11 | 0.1% | |
| 7 | 36 | 0.3% |
bathroom
Real number (ℝ≥0)
| Distinct count | 10 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.2368125701459034 |
|---|---|
| Minimum | 1 |
| Maximum | 10 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 5 |
| Maximum | 10 |
| Range | 9 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.407198198 |
|---|---|
| Coefficient of variation (CV) | 0.6291086777 |
| Kurtosis | 1.134852401 |
| Mean | 2.23681257 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.213809657 |
| Sum | 23916 |
| Variance | 1.980206769 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 4301 | 40.2% | |
| 2 | 2910 | 27.2% | |
| 3 | 1474 | 13.8% | |
| 4 | 1111 | 10.4% | |
| 5 | 578 | 5.4% | |
| 6 | 215 | 2.0% | |
| 7 | 85 | 0.8% | |
| 8 | 11 | 0.1% | |
| 9 | 4 | < 0.1% | |
| 10 | 3 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 4301 | 40.2% | |
| 2 | 2910 | 27.2% | |
| 3 | 1474 | 13.8% | |
| 4 | 1111 | 10.4% | |
| 5 | 578 | 5.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 10 | 3 | < 0.1% | |
| 9 | 4 | < 0.1% | |
| 8 | 11 | 0.1% | |
| 7 | 85 | 0.8% | |
| 6 | 215 | 2.0% |
parking spaces
Real number (ℝ≥0)
| Distinct count | 11 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.6091470258136924 |
|---|---|
| Minimum | 0 |
| Maximum | 12 |
| Zeros | 2683 |
| Zeros (%) | 25.1% |
| Memory size | 83.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 5 |
| Maximum | 12 |
| Range | 12 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.589520724 |
|---|---|
| Coefficient of variation (CV) | 0.9878032885 |
| Kurtosis | 2.769074701 |
| Mean | 1.609147026 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.487534127 |
| Sum | 17205 |
| Variance | 2.526576131 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 3630 | 34.0% | |
| 0 | 2683 | 25.1% | |
| 2 | 2070 | 19.4% | |
| 3 | 968 | 9.1% | |
| 4 | 789 | 7.4% | |
| 5 | 230 | 2.2% | |
| 6 | 163 | 1.5% | |
| 8 | 123 | 1.2% | |
| 7 | 33 | 0.3% | |
| 10 | 2 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 2683 | 25.1% | |
| 1 | 3630 | 34.0% | |
| 2 | 2070 | 19.4% | |
| 3 | 968 | 9.1% | |
| 4 | 789 | 7.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 12 | 1 | < 0.1% | |
| 10 | 2 | < 0.1% | |
| 8 | 123 | 1.2% | |
| 7 | 33 | 0.3% | |
| 6 | 163 | 1.5% |
floor
Categorical
| Distinct count | 35 |
|---|---|
| Unique (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 83.5 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| - | 2461 | 23.0% | |
| 1 | 1081 | 10.1% | |
| 2 | 985 | 9.2% | |
| 3 | 931 | 8.7% | |
| 4 | 748 | 7.0% | |
| 5 | 600 | 5.6% | |
| 6 | 539 | 5.0% | |
| 7 | 497 | 4.6% | |
| 8 | 490 | 4.6% | |
| 9 | 369 | 3.5% | |
| Other values (25) | 1991 | 18.6% |
Length
| Max length | 3 |
|---|---|
| Median length | 1 |
| Mean length | 1.18630752 |
| Min length | 1 |
animal
Categorical
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 83.5 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| acept | 8316 | 77.8% | |
| not acept | 2376 | 22.2% |
Length
| Max length | 9 |
|---|---|
| Median length | 5 |
| Mean length | 5.888888889 |
| Min length | 5 |
furniture
Categorical
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 83.5 KiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| not furnished | 8086 | 75.6% | |
| furnished | 2606 | 24.4% |
Length
| Max length | 13 |
|---|---|
| Median length | 13 |
| Mean length | 12.02506547 |
| Min length | 9 |
hoa (R$)
Real number (ℝ≥0)
| Distinct count | 1679 |
|---|---|
| Unique (%) | 15.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1174.0216984661429 |
|---|---|
| Minimum | 0 |
| Maximum | 1117000 |
| Zeros | 2373 |
| Zeros (%) | 22.2% |
| Memory size | 83.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 170 |
| median | 560 |
| Q3 | 1237.5 |
| 95-th percentile | 3167.45 |
| Maximum | 1117000 |
| Range | 1117000 |
| Interquartile range (IQR) | 1067.5 |
Descriptive statistics
| Standard deviation | 15592.30525 |
|---|---|
| Coefficient of variation (CV) | 13.28110483 |
| Kurtosis | 4912.249106 |
| Mean | 1174.021698 |
| Median Absolute Deviation (MAD) | 550 |
| Skewness | 69.03938119 |
| Sum | 12552640 |
| Variance | 243119983 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 2373 | 22.2% | |
| 400 | 177 | 1.7% | |
| 300 | 168 | 1.6% | |
| 500 | 164 | 1.5% | |
| 600 | 141 | 1.3% | |
| 450 | 140 | 1.3% | |
| 350 | 137 | 1.3% | |
| 700 | 131 | 1.2% | |
| 1000 | 125 | 1.2% | |
| 2000 | 114 | 1.1% | |
| Other values (1669) | 7022 | 65.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 2373 | 22.2% | |
| 1 | 30 | 0.3% | |
| 3 | 1 | < 0.1% | |
| 10 | 1 | < 0.1% | |
| 15 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1117000 | 2 | < 0.1% | |
| 220000 | 1 | < 0.1% | |
| 200000 | 1 | < 0.1% | |
| 81150 | 1 | < 0.1% | |
| 32000 | 1 | < 0.1% |
rent amount (R$)
Real number (ℝ≥0)
| Distinct count | 1195 |
|---|---|
| Unique (%) | 11.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3896.247194163861 |
|---|---|
| Minimum | 450 |
| Maximum | 45000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.5 KiB |
Quantile statistics
| Minimum | 450 |
|---|---|
| 5-th percentile | 859.1 |
| Q1 | 1530 |
| median | 2661 |
| Q3 | 5000 |
| 95-th percentile | 12000 |
| Maximum | 45000 |
| Range | 44550 |
| Interquartile range (IQR) | 3470 |
Descriptive statistics
| Standard deviation | 3408.545518 |
|---|---|
| Coefficient of variation (CV) | 0.8748278402 |
| Kurtosis | 4.62422818 |
| Mean | 3896.247194 |
| Median Absolute Deviation (MAD) | 1361 |
| Skewness | 1.838877304 |
| Sum | 41658675 |
| Variance | 11618182.55 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2500 | 258 | 2.4% | |
| 2000 | 244 | 2.3% | |
| 1200 | 237 | 2.2% | |
| 3000 | 235 | 2.2% | |
| 15000 | 231 | 2.2% | |
| 3500 | 216 | 2.0% | |
| 1800 | 215 | 2.0% | |
| 1500 | 211 | 2.0% | |
| 4000 | 202 | 1.9% | |
| 2200 | 201 | 1.9% | |
| Other values (1185) | 8442 | 79.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 450 | 1 | < 0.1% | |
| 460 | 1 | < 0.1% | |
| 500 | 31 | 0.3% | |
| 503 | 1 | < 0.1% | |
| 505 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 45000 | 1 | < 0.1% | |
| 30000 | 1 | < 0.1% | |
| 25000 | 1 | < 0.1% | |
| 24000 | 1 | < 0.1% | |
| 20000 | 5 | < 0.1% |
property tax (R$)
Real number (ℝ≥0)
| Distinct count | 1243 |
|---|---|
| Unique (%) | 11.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 366.70435839880287 |
|---|---|
| Minimum | 0 |
| Maximum | 313700 |
| Zeros | 1596 |
| Zeros (%) | 14.9% |
| Memory size | 83.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 38 |
| median | 125 |
| Q3 | 375 |
| 95-th percentile | 1342.8 |
| Maximum | 313700 |
| Range | 313700 |
| Interquartile range (IQR) | 337 |
Descriptive statistics
| Standard deviation | 3107.832321 |
|---|---|
| Coefficient of variation (CV) | 8.475035134 |
| Kurtosis | 9667.782564 |
| Mean | 366.7043584 |
| Median Absolute Deviation (MAD) | 121 |
| Skewness | 96.01359411 |
| Sum | 3920803 |
| Variance | 9658621.736 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1596 | 14.9% | |
| 100 | 181 | 1.7% | |
| 50 | 171 | 1.6% | |
| 84 | 145 | 1.4% | |
| 250 | 131 | 1.2% | |
| 42 | 115 | 1.1% | |
| 167 | 105 | 1.0% | |
| 25 | 103 | 1.0% | |
| 59 | 98 | 0.9% | |
| 67 | 98 | 0.9% | |
| Other values (1233) | 7949 | 74.3% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1596 | 14.9% | |
| 1 | 26 | 0.2% | |
| 2 | 4 | < 0.1% | |
| 3 | 12 | 0.1% | |
| 4 | 12 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 313700 | 1 | < 0.1% | |
| 28120 | 1 | < 0.1% | |
| 21880 | 1 | < 0.1% | |
| 12500 | 1 | < 0.1% | |
| 10830 | 1 | < 0.1% |
fire insurance (R$)
Real number (ℝ≥0)
| Distinct count | 216 |
|---|---|
| Unique (%) | 2.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 53.300879161990274 |
|---|---|
| Minimum | 3 |
| Maximum | 677 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.5 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 12 |
| Q1 | 21 |
| median | 36 |
| Q3 | 68 |
| 95-th percentile | 160 |
| Maximum | 677 |
| Range | 674 |
| Interquartile range (IQR) | 47 |
Descriptive statistics
| Standard deviation | 47.76803093 |
|---|---|
| Coefficient of variation (CV) | 0.8961959292 |
| Kurtosis | 5.934963027 |
| Mean | 53.30087916 |
| Median Absolute Deviation (MAD) | 18 |
| Skewness | 1.970399756 |
| Sum | 569893 |
| Variance | 2281.784779 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 16 | 300 | 2.8% | |
| 20 | 291 | 2.7% | |
| 26 | 270 | 2.5% | |
| 22 | 256 | 2.4% | |
| 14 | 248 | 2.3% | |
| 17 | 248 | 2.3% | |
| 23 | 245 | 2.3% | |
| 13 | 239 | 2.2% | |
| 18 | 220 | 2.1% | |
| 19 | 214 | 2.0% | |
| Other values (206) | 8161 | 76.3% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 3 | 2 | < 0.1% | |
| 4 | 2 | < 0.1% | |
| 5 | 5 | < 0.1% | |
| 6 | 10 | 0.1% | |
| 7 | 53 | 0.5% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 677 | 1 | < 0.1% | |
| 451 | 1 | < 0.1% | |
| 376 | 1 | < 0.1% | |
| 338 | 1 | < 0.1% | |
| 305 | 1 | < 0.1% |
total (R$)
Real number (ℝ≥0)
| Distinct count | 5751 |
|---|---|
| Unique (%) | 53.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5490.4869996258885 |
|---|---|
| Minimum | 499 |
| Maximum | 1120000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 83.5 KiB |
Quantile statistics
| Minimum | 499 |
|---|---|
| 5-th percentile | 1128.55 |
| Q1 | 2061.75 |
| median | 3581.5 |
| Q3 | 6768 |
| 95-th percentile | 15164.5 |
| Maximum | 1120000 |
| Range | 1119501 |
| Interquartile range (IQR) | 4706.25 |
Descriptive statistics
| Standard deviation | 16484.72591 |
|---|---|
| Coefficient of variation (CV) | 3.002415981 |
| Kurtosis | 3926.019305 |
| Mean | 5490.487 |
| Median Absolute Deviation (MAD) | 1842.5 |
| Skewness | 58.96080292 |
| Sum | 58704287 |
| Variance | 271746188.4 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2555 | 39 | 0.4% | |
| 2633 | 25 | 0.2% | |
| 4089 | 21 | 0.2% | |
| 1219 | 15 | 0.1% | |
| 760 | 12 | 0.1% | |
| 1572 | 11 | 0.1% | |
| 1117 | 11 | 0.1% | |
| 2586 | 10 | 0.1% | |
| 1431 | 10 | 0.1% | |
| 10840 | 10 | 0.1% | |
| Other values (5741) | 10528 | 98.5% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 499 | 1 | < 0.1% | |
| 507 | 2 | < 0.1% | |
| 508 | 1 | < 0.1% | |
| 509 | 1 | < 0.1% | |
| 545 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1120000 | 2 | < 0.1% | |
| 316900 | 1 | < 0.1% | |
| 233200 | 1 | < 0.1% | |
| 222100 | 1 | < 0.1% | |
| 95610 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
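As a minimal sketch of that calculation, the function below divides the covariance by the product of the standard deviations. The function name is illustrative, and the data are the rent amount (R$) and fire insurance (R$) values of the first ten sample rows only, so the result will not equal the coefficient for the full dataset.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n factors cancel between numerator and denominator.
    return cov / sqrt(var_x * var_y)

# First ten sample rows: rent amount (R$) and fire insurance (R$).
rent = [3300, 4960, 2800, 1112, 800, 8000, 1900, 3223, 15000, 2300]
fire = [42, 63, 41, 17, 11, 121, 25, 41, 191, 30]

r = pearson_r(rent, fire)
# Shifting and rescaling one variable by a positive factor leaves r unchanged.
r_shifted = pearson_r([2 * x + 100 for x in rent], fire)
print(round(r, 3), round(r_shifted, 3))
```

The two printed values agree, which illustrates the invariance under changes of location and scale.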
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
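The sketch below follows that recipe: it replaces each value by its rank (tied values share the average rank, a common convention assumed here) and applies the Pearson formula to the ranks. The helper names and the toy data are illustrative only.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but nonlinear
print(spearman_rho(xs, ys))
print(pearson_r(xs, ys))
```

For this monotonic but nonlinear relationship ρ reaches 1 while r stays below 1.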
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
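The pair-counting definition can be sketched directly. The version below is the plain form described in the text (often called τ-a), in which tied pairs count as neither concordant nor discordant; statistical libraries frequently apply a tie correction instead, so their results may differ on data with many ties. The function name and the toy data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]  # one swapped pair out of six
print(kendall_tau_a(xs, ys))  # (5 - 1) / 6
```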
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
First rows
| | city | area | rooms | bathroom | parking spaces | floor | animal | furniture | hoa (R$) | rent amount (R$) | property tax (R$) | fire insurance (R$) | total (R$) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | São Paulo | 70 | 2 | 1 | 1 | 7 | acept | furnished | 2065 | 3300 | 211 | 42 | 5618 |
| 1 | São Paulo | 320 | 4 | 4 | 0 | 20 | acept | not furnished | 1200 | 4960 | 1750 | 63 | 7973 |
| 2 | Porto Alegre | 80 | 1 | 1 | 1 | 6 | acept | not furnished | 1000 | 2800 | 0 | 41 | 3841 |
| 3 | Porto Alegre | 51 | 2 | 1 | 0 | 2 | acept | not furnished | 270 | 1112 | 22 | 17 | 1421 |
| 4 | São Paulo | 25 | 1 | 1 | 0 | 1 | not acept | not furnished | 0 | 800 | 25 | 11 | 836 |
| 5 | São Paulo | 376 | 3 | 3 | 7 | - | acept | not furnished | 0 | 8000 | 834 | 121 | 8955 |
| 6 | Rio de Janeiro | 72 | 2 | 1 | 0 | 7 | acept | not furnished | 740 | 1900 | 85 | 25 | 2750 |
| 7 | São Paulo | 213 | 4 | 4 | 4 | 4 | acept | not furnished | 2254 | 3223 | 1735 | 41 | 7253 |
| 8 | São Paulo | 152 | 2 | 2 | 1 | 3 | acept | furnished | 1000 | 15000 | 250 | 191 | 16440 |
| 9 | Rio de Janeiro | 35 | 1 | 1 | 0 | 2 | acept | furnished | 590 | 2300 | 35 | 30 | 2955 |
Last rows
| | city | area | rooms | bathroom | parking spaces | floor | animal | furniture | hoa (R$) | rent amount (R$) | property tax (R$) | fire insurance (R$) | total (R$) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10682 | Porto Alegre | 160 | 3 | 2 | 3 | 4 | acept | furnished | 850 | 3300 | 220 | 49 | 4419 |
| 10683 | São Paulo | 280 | 4 | 4 | 2 | 5 | acept | not furnished | 4200 | 4000 | 1042 | 51 | 9293 |
| 10684 | Rio de Janeiro | 98 | 2 | 1 | 0 | 1 | acept | not furnished | 560 | 3900 | 184 | 51 | 4695 |
| 10685 | São Paulo | 83 | 3 | 2 | 2 | 11 | acept | not furnished | 888 | 7521 | 221 | 96 | 8726 |
| 10686 | São Paulo | 150 | 3 | 3 | 2 | 8 | not acept | furnished | 0 | 13500 | 0 | 172 | 13670 |
| 10687 | Porto Alegre | 63 | 2 | 1 | 1 | 5 | not acept | furnished | 402 | 1478 | 24 | 22 | 1926 |
| 10688 | São Paulo | 285 | 4 | 4 | 4 | 17 | acept | not furnished | 3100 | 15000 | 973 | 191 | 19260 |
| 10689 | Rio de Janeiro | 70 | 3 | 3 | 0 | 8 | not acept | furnished | 980 | 6000 | 332 | 78 | 7390 |
| 10690 | Rio de Janeiro | 120 | 2 | 2 | 2 | 8 | acept | furnished | 1585 | 12000 | 279 | 155 | 14020 |
| 10691 | São Paulo | 80 | 2 | 1 | 0 | - | acept | not furnished | 0 | 1400 | 165 | 22 | 1587 |
Duplicate rows
Most frequent
| | city | area | rooms | bathroom | parking spaces | floor | animal | furniture | hoa (R$) | rent amount (R$) | property tax (R$) | fire insurance (R$) | total (R$) | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 80 | Porto Alegre | 47 | 1 | 1 | 1 | 1 | not acept | furnished | 400 | 2200 | 0 | 33 | 2633 | 22 |
| 153 | São Paulo | 20 | 1 | 1 | 0 | - | acept | furnished | 602 | 1800 | 130 | 23 | 2555 | 14 |
| 201 | São Paulo | 45 | 1 | 1 | 1 | 1 | not acept | furnished | 3000 | 5520 | 0 | 70 | 8590 | 9 |
| 160 | São Paulo | 20 | 1 | 1 | 0 | 5 | acept | furnished | 602 | 1800 | 130 | 23 | 2555 | 7 |
| 187 | São Paulo | 35 | 1 | 1 | 0 | 1 | not acept | not furnished | 250 | 1305 | 0 | 17 | 1572 | 7 |
| 195 | São Paulo | 40 | 1 | 1 | 0 | - | not acept | not furnished | 0 | 780 | 17 | 12 | 809 | 7 |
| 186 | São Paulo | 35 | 1 | 1 | 0 | - | acept | not furnished | 0 | 1100 | 30 | 14 | 1144 | 6 |
| 211 | São Paulo | 50 | 1 | 1 | 0 | - | not acept | not furnished | 0 | 1250 | 34 | 19 | 1303 | 6 |
| 68 | Campinas | 110 | 3 | 3 | 2 | - | acept | not furnished | 560 | 3200 | 88 | 49 | 3897 | 5 |
| 87 | Rio de Janeiro | 15 | 1 | 1 | 0 | - | acept | not furnished | 0 | 700 | 0 | 10 | 710 | 5 |